Cream of the Crop 3

home *** CD-ROM | disk | FTP | other *** search

/ Cream of the Crop 3 / Cream of the Crop 3.iso / utility / sfs100.zip / SFS2.DOC < prev next >

Wrap

Text File | 1994-02-20 | 117KB | 2,046 lines

The Care and Feeding of Passwords --------------------------------- With the inherent strength of an encryption system like the one used by SFS, the password used for encryption is becoming more the focus of attack than the encryption system itself. The reason for this is that trying to guess an encryption password is far simpler than trying to break the encryption system. SFS allows keys of up to 100 characters in length. These keys can contain letters, numbers, spaces, punctuation, and most control and extended characters except backspace (which is used for editing), escape (which is used to abort the password entry), and carriage return or newline, which are used to signify the end of the password. This fact should be made use of to the fullest, with preferred passwords being entire phrases rather than individual words (in fact since very few words are longer than the SFS absolute minimum password length of 10 characters, the complete set of these words can be checked in moments). There exist programs designed to allow high-speed password cracking of standard encryption algorithms which can, in a matter of hours (sometimes minutes, even seconds in the case of very weak algorithms), attempt to use the contents of a number of very large and complete dictionaries as sample passwords[1]. Virtually all passwords composed of single words can be broken with ease in this manner, even in the case of encryption methods like the one which is used by SFS, which has been specially designed to be resistant to this form of attack (doing a test of all possible 10-letter passwords assuming a worst-case situation in which the password contains lowercase letters only, can be accomplished in 450,000 years on a fast workstation (DEC Alpha) if the attacker knows the contents of the encrypted volume in advance - or about 4 1/2 years on a network of 100,000 of these machines). Of course no attacker would use this approach, as few people will use every possible combination of 10 letter passwords. By using an intelligent dictionary-based cracking program, this time can be reduced to only a few months. Complete programs which perform this task and libraries for incorporation into other software are already widely available. This problem is especially apparent if the encryption algorithm used is very weak - the encryption used by the popular Pkzip archiver, for example, can usually be broken in this manner in a few seconds on a cheap personal computer using the standard wordlist supplied with all Unix systems. Simple modifications to passwords should not be trusted. Capitalizing some letters, spelling the words backwards, adding one or two digits to the end, and so on, only present a slightly more difficult challenge to the average password-cracker than plain unadorned passwords. Any phrase which could be present in any kind of list (song lyrics, movie scripts, books, plays, poetry, famous sayings, and so on) should not be used - again, these can be easily and automatically checked by computers. Using foreign languages offers no extra security, since it means an attacker merely has to switch to using foreign-language dictionaries (or phrase lists, song lyrics, and so on). Relying on an attacker not knowing that a foreign language is being used ("If I use Swahili they'll never think of checking for it" - the so-called "Security through obscurity" technique) offers no extra security, since the few extra days or months it will take them to check every known language are only a minor inconvenience. Probably the most difficult passwords to crack are ones comprising unusual phrases or sentences, since instead of searching a small body of text like the contents of a dictionary, book, or phrase list, the cracker must search a much larger corpus of data, namely all possible phrases in the language being used. Needless to say, the use of common phrases should be avoided, since these will be an obvious target for crackers. Some sample bad passwords might be: misconception Found in a standard dictionary noitpecnocsim Reversed standard dictionary word miskonseption Simple misspelling of a standard word m1skon53pshun Not-so-simple misspelling of a standard word MiScONcepTiON Standard word with strange capitalization misconception1234 Standard word with simple numeric code appended 3012261726 Simple numeric code, probably a US phone number YKYBHTLWYS Simple mnemonic In general coming up with a secure single-word password is virtually impossible unless you have a very good memory for things like unique 20-digit numbers. Some sample bad passphrases might be: What has it got in its pocketses? Found in a common book Ph'n-glui mgl'w naf'h Cthulhu R'yleh w'gah Found in a somewhat less common book For yesterday the word of Caesar might have stood Found in a theatrical work modify the characteristics of a directory Found in a technical manual T'was brillig, and the slithy toves Found in a book of poetry I've travelled roads that lead to wonder Found in a list of music lyrics azetylenoszilliert in phaenomenaler kugelform Found in an obscure foreign journal Arl be back Found in several films I don't recall Associated with a famous person (although it does make a good answer to the question "What's the password?" during an interrogation) Footnote [1]: A large collection of dictionaries suitable for this kind of attack can be found on black.ox.ac.uk in the `wordlists' directory. These dictionaries contain, among other things, 2MB of Dutch words, 2MB of German words, 600KB of Italian words, 600KB of Norwegian words, 200KB of Swedish words, 3.3MB of Finnish words, 1MB of Japanese words, 1.1MB of Polish words, 700KB of assorted names, and a very large collection of assorted wordlists covering technical terms, jargon, the Koran, the works of Lewis Carrol, characters, actors, and titles from movies and plays, Monty Python, Star Trek, US politics, US postal areas, the CIA world fact book, the contents of several large standard dictionaries and thesaurii, and common terms from Australian, Chinese, Danish, Dutch, French, German, Italian, Japanese, Norwegian, Polish, Spanish, Swedish, Yiddish, computers, literature, places, religion, and scientific terms. The black.ox.ac.uk site also contains, in the directory /src/security/cracklib25.tar.Z, a word dictionary of around 10MB, stored as a 6MB compressed tar file. A large dictionary of English words which also contains abbreviations, hyphenations, and misspelled words, is available from wocket.vantage.gte.com in the pub/standard_dictionary as dic-0294.tar, an uncompressed 8.9MB file, or dic-0294.tar.Z, a compressed 4.1MB file, and contains around 880,000 entries. In combination with a Markov model for the English language built from commonly-available texts, this wordlist provides a powerful tool for attacking even full passphrases. A Unix password dictionary is available from ftp.spc.edu as .unix/password-dictionary.txt. Grady Ward <grady@netcom.com> has collected very large collections of words, phrases, and other items suitable for dictionary attacks on cryptosystems. Even the NSA has used his lists in their work. Other Software -------------- There are a small number of other programs available which claim to provide disk security of the kind provided by SFS. However by and large these tend to use badly or incorrectly implemented algorithms, or algorithms which are known to offer very little security. One such example is Norton's Diskreet, which encrypts disks using either a fast proprietary cipher or the US Data Encryption Standard (DES). The fast proprietary cipher is very simple to break, and offers protection only against a casual browser. Certainly anyone with any programming skill won't be stopped for long by a system as simple as this. The more secure DES algorithm is also available in Diskreet, but there are quite a number of implementation errors which greatly reduce the security it should provide. Although accepting a password of up to 40 characters, it then converts this to uppercase-only characters and then reduces the total size to 8 characters of which only a small portion are used for the encryption itself. This leads to a huge reduction in the number of possible encryption keys, so that not only are there a finite (and rather small) total number of possible passwords, there are also a large number of equivalent keys, any of which will decrypt a file (for example a file encrypted with the key 'xxxxxx' can be decrypted with 'xxxxxx', 'xxxxyy', 'yyyyxx', and a large collection of other keys, too many to list here). These fatal flaws mean that a fast dictionary-based attack can be used to check virtually all possible passwords in a matter of hours in a standard PC. In addition the CBC (cipher block chaining) encryption mode used employs a known, fixed initialisation vector (IV), which means that data patterns on the encrypted disk are not hidden by the encryption. Using these two implementation errors, a program can be constructed which will examine a Diskreet-encrypted disk and produce the password used to encrypt it (or at least one of the many, many passwords capable of decrypting it) within moments. These problems are in fact explicitly warned against in any of the documents covering DES and its modes of operation, such as ISO Standards 10116 and 10126-2, US Government FIPS Publication 81, or basic texts like Dennings "Cryptography and Data Security". It appears that the authors of Diskreet never even bothered to read any of the standard texts on encryption to make sure they were doing things right. In addition the Diskreet encryption code appears to be taken from a separate library rather than being written by the people who sell Diskreet, with implementation errors in both the encryption code and the rest of Diskreet. The DES routines used in Da Vinci, a popular groupware product, are similarly poorly implemented. Not only is an 8-character password used directly as the DES key, but the DES encryption method used is the electronic codebook (ECB) mode, whose use is warned against in even the most basic cryptography texts and, in a milder form, in various international encryption standards. For example, Annex A.1 of ISO 10116:1991 states "The ECB mode is in general not recommended". ISO 10126-2:1991 doesn't even mention ECB as being useful for message encryption. The combination of Da Vinci's very regular file structure (which provides an attacker with a large amount of known data in very file), the weak ECB encryption mode, and the extremely limited password range, makes a precomputed dictionary attack (which involves a single lookup in a pre-set table of plaintext-ciphertext pairs) very easy. As ECB mode has no pattern hiding ability whatsoever, all that is necessary is to encrypt a common pattern (such as a string of spaces) with all possible dictionary password values, and sort and store the result in a table. Any password in the dictionary can then be broken just as fast as the value can be read out of the table. PC Tools is another example of a software package which offers highly insecure encryption. The DES implementation used in this package has had the number of rounds reduced from the normal 16 to a mere 2, making it trivial to break on any cheap personal computer. This very weak implementation is distributed despite a wide body of research which documents just how insecure 2-round DES really is. Even a correctly-implemented and applied DES encryption system offers only marginal security against a determined attacker. It has long been rumoured that certain government agencies and large corporations (and, no doubt, criminal organizations) possessed specialized hardware which allowed them to break the DES encryption. However only in August of 1993 have complete constructional details for such a device been published. This device, for which the budget version can be built for around $100,000, can find a DES key in 3.5 hours for the somewhat more ambitious $1 million version (the budget version takes 1 1/2 days to perform the same task). The speed of this device scales linearly with cost, so that the time taken can be reduced to minutes or even seconds if enough money is invested. This is a one-off cost, and once a DES-breaking machine of this type is built it can sit there day and night churning out a new DES key every few minutes, hours, or days (depending on the budget of the attacker). In the 1980's, the East German company Robotron manufactured hundreds of thousands of DES chips for the former Soviet Union. This means one of two things: Either the Soviet Union used the chips to build a DES cracker, or they used DES to encrypt their own communications, which means that the NSA built one. The only way around the problem of fast DES crackers is to run DES more than once over the data to be encrypted, using so-called triple DES (using DES twice is as easy to attack as single DES, so in practice three iterations must be used). DES is inherently slow. Triple DES is three times as slow. A hard drive which performs like a large-capacity floppy drive may give users a sense of security, but won't do much for their patience. The continued use of DES, mainly in the US, has been more due to a lack of any replacement than to an ongoing belief in its security. The National Bureau of Standards (now National Institute of Standards and Technology) has only relucatantly re-certified DES for further use every five years. Interestingly enough, the Australian government, which recently developed its own replacement for DES called SENECA, now rates DES as being "inappropriate for protecting government and privacy information" (this includes things like taxation information and social security and other personal data). Now that an alternative is available, the Australian government seems unwilling to even certify DES for information given under an "in confidence" classification, which is a relatively low security rating. Finally, the add-on "encryption" capabilities offered by general software packages are usually laughable. Various programs exist which will automatically break the "encryption" offered by software such as Arc, Arj, Lotus 123, Lotus Symphony, Microsoft Excel, Microsoft Word, Paradox, Pkzip 1.x, the "improved encryption" in Pkzip 2.x, Quattro Pro, Unix crypt(1), Wordperfect 5.x and ealier, the "improved" encryption in Wordperfect 6.x, and many others. Indeed, these systems are often so simple to break that at least one package which does so adds several delay loops simply to make it look as if there were actually some work involved in the process. Although the manuals for these programs make claims such as "If you forget the password, there is absolutely no way to retrieve the document", the "encryption" used can often be broken with such time-honoured tools as a piece of paper, a pencil, and a small amount of thought. Some programs which offer "password protection security" don't even try to perform any encryption, but simply do a password check to allow access to the data. Two examples of this are Stacker and Fastback, both of which can either have their code patched or have a few bytes of data changed to ignore any password check before granting access to data. Data Security ------------- This section presents an overview of a range of security problems which are, in general, outside the reach of SFS. These include relatively simple problems such as not-quite-deleted files and general computer security, through to sophisticated electronic monitoring and surveillance of a location in order to recover confidential data or encryption keys. The coverage is by no means complete, and anyone seriously concerned about the possibility of such an attack should consult a qualified security expert for further advice. Information Leakage There are several ways in which information can leak from an encrypted SFS volume onto other media. The simplest kind is in the form of temporary files maintained by application software and operating systems, which are usually stored in a specific location and which, when recovered, may contain file fragments or entire files from an encrypted volume. This is true not only for the traditional word processors, spreadsheets, editors, graphics packages, and so on which create temporary files on disk in which to save data, but also for operating systems such as OS/2, Windows NT, and Unix, which reserve a special area of a disk to store data which is swapped in and out of memory when more room is needed. This information is usually deleted by the application after use, so that the user isn't even aware that it exists. Unfortunately "deletion" usually consists of setting a flag which indicates that the file has been deleted, rather than overwriting the data in any secure way. Any information which is "deleted" in this manner can be trivially recovered using a wide variety of tools. In the case of a swap file there is no explicit deletion as the swap area is invisible to the user anyway. In a lightly-loaded system, data may linger in a swap area for a considerable amount of time. The only real solution to this is to redirect all temporary files and swap files either to an encryption volume or to a RAM disk whose contents will be lost when power is removed. Most programs allow this redirection, either as part of the program configuration options or by setting the TMP or TEMP environment variables to point to the encrypted volume or RAM disk. Unfortunately moving the swap area and temporary files to an encrypted volume results in a slowdown in speed as all data must now be encrypted. One of the basic premises behind swapping data to disk is that very fast disk access is available. By slowing down the speed of swapping, the overall speed of the system (once swapping becomes necessary) is reduced. However once a system starts swapping there is a significant slowdown anyway (with or without encryption), so the decision as to whether the swap file should be encrypted or not is left to the individual user. The other major form of information leakage with encrypted volumes is when backing up encrypted data. Currently there is no generally available secure backup software (the few applications which offer "security" features are ridiculously easy to circumvent), so that all data stored on an encrypted volume will be backed up in unencrypted form. Like the decision on where to store temporary data and swap files, this is a tradeoff between security and convenience. If it were possible to back up an encrypted volume in its encrypted form, the entire volume would have to be backed up as one solid block every time a backup was made. This could mean a daily five-hundred-megabyte backup instead of only the half megabyte which has changed recently. Incremental backups would be impossible. Backing up or restoring individual files would be impossible. Any data loss or errors in the middle of a large encrypted block could be catastrophic (in fact the encryption method used in SFS has been carefully selected to ensure that even a single encrypted data bit changed by an attacker will be noticeable when the data is decrypted[1]). Since SFS volumes in their encrypted form are usually invisible to the operating system anyway, the only way in which an encrypted volume can be backed up is by accessing it through the SFS driver, which means the data is stored in its unencrypted form. This has the advantage of allowing standard backup software and schedules to be used, and the disadvantage of making the unencrypted data available to anyone who has access to the backups. User discretion is advised. If it is absolutely essential that backups be encrypted, and the time (and storage space) is available to back up an entire encrypted volume, then the Rawdisk 1.1 device driver written by Juergen Prang <prang@du9ds4.fb9dv.uni-duisburg.de> may be used to make the entire encrypted SFS volume appear as a file on a DOS drive which can be backed up using standard DOS backup software. The instructions which come with Rawdisk give details on setting the driver up to allow non-DOS volumes to be backed up as standard DOS drives. The SFS volume will appear as a single enormous file RAWDISK.DAT which entirely fills the DOS volume. Footnote [1]: This is not a serious limitation, since it will only affect deliberate changes in the data. Any accidental corruption due to disk errors will result in the drive hardware reporting the whole sector the data is on as being unreadable. If the data is deliberately changed, the sector will be readable without errors, but won't be able to be decrypted. Eavesdropping The simplest form of eavesdropping consists of directly overwiewing the system on which confidential data is being processed. The easiest defence is to ensure that no direct line-of-sight path exists from devices such as computer monitors and printers to any location from which an eavesdropper can view the equipment in question. Copying of documents and the contents of computer monitors is generally possible at up to around 100 metres (300 feet) with relatively unsophisticated equipment, but is technically possible at greater distances. The possibility of monitoring from locations such as office-corridor windows and nearby rooms should also be considered. This problem is particularly acute in open-plan offices and homes. The next simplest form of eavesdropping is remote eavesdropping, which does not require access to the building but uses techniques for information collection at a distance. The techniques used include taking advantage of open windows or other noise conveying ducts such as air conditioning and chimneys, using long-range directional microphones, and using equipment capable of sensing vibrations from surfaces such as windows which are modulated by sound from the room they enclose. By recording the sound of keystrokes when a password or sensitive data is entered, an attacker can later recreate the password or data, given either access to the keyboard itself or enough recorded keystrokes to reconstruct the individual key sound patterns. Similar attacks are possible with some output devices such as impact printers. Another form of eavesdropping involves the exploitation of existing equipment such as telephones and intercoms for audio monitoring purposes. In general any device which handles audio signals and which can allow speech or other sounds to be transmitted from the place of interest, which can be modified to perform this task, or which can be used as a host to conceal a monitoring device and provide power and possibly microphone and transmission capabilites to it (such as, for example, a radio) can be the target for an attacker. These devices can include closed-circuit television systems (which can allow direct overviewing of confidential information displayed on monitors and printers), office communication systems such as public address systems, telephones, and intercoms (which can either be used directly or modified to transmit sound from the location to be monitored), radios and televisions (which can be easily adapted to act as transmitters and which already contain power supplies, speakers (to act as microphones), and antennae), and general electrical and electronic equipment which can harbour a range of electronic eavesdropping devices and feed them with their own power. Another eavesdropping possibility is the recovery of information from hardcopy and printing equipment. The simplest form of this consists of searching through discarded printouts and other rubbish for information. Even shredding a document offers only momentary protection against a determined enough attacker. The recovery of text from the one-pass ribbon used in high-quality impact printers is trivial. Recovery of text from multipass ribbons is also possible, albeit with somewhat more difficulty. Recovery of the last few pages printed on a laser printer may also be possible. Possibly the ultimate form of eavesdropping currently available, usually referred to as TEMPEST (or occasionally van Eck) monitoring, consists of monitoring the signals generated by all electrically-powered equipment. These signals can be radiated in the same way as standard radio and television transmissions, or conducted along wiring or other metal work. Some of these signals will be related to information being processed by the equipment, and can be easily intercepted (even at a significant distance) and used to reconstruct the information in question. TEMPEST monitoring is usually relatively expensive in resources, difficult to mount, and unpredictable in outcome. It is most likely to be carried out where other methods of eavesdropping are impractical and where general security measures are effective in stopping monitoring. However, once in place, the amount of information available through this form of eavesdropping is immense. In general it allows the almost complete recovery of all data being processed by a certain device such as a monitor or printer, almost undetectably, and over a long period of time. Protection against TEMPEST monitoring is difficult and expensive, and is best left to computer security experts. Trojan Horses It may be possible for an attacker to replace the SFS software with a copy which seems to be identical but which has major weaknesses in it which make an attack much easier, for example by using only a few characters of the password to encrypt the disk. The least likely target is mksfs, since changing the way this operates would require a similar change to mountsfs and the SFS driver which would be easily detectable by comparing them with and independant, original copy. Since a changed mksfs would require the long-term use of a similarly changed mountsfs and driver, the changes of detection are greatly increased. A much more subtle attack involves changing mountsfs. By substituting a version which saves the user's password or encryption key to an unused portion of the disk and then replaces itself with an unmodified, original copy, an attacker can return at their leisure and read the password or key off the disk, with the user none the wiser that their encryption key has been compromised. The SFS driver may be modified to do this as well, although the task is slighly more difficult than changing mountsfs. Detecting this type of attack is very difficult, since although it is possible to use security software which detects changes, this itself might be modified to give a false reading. Software which checks the checking software may in turn be modified, and so on ad infinitum. In general someone who is determined enough can plant an undetectable trojan[1], although precautions like using modification-detection programs, keeping physically separate copies of the SFS software, and occasionally checking the installed versions against other, original copies, may help reduce the risk somewhat. The eventual creation of a hardware SFS encryption card will reduce the risk further, although it is still possible for an attacker to substitute their own fake encryption card. Another possibility is the creation of a program unrealted to SFS which monitors the BIOS character write routines for the printing of the password prompt, or video RAM for the appearance of the prompt, or the BIOS keyboard handler, or any number of other possibilities, and then reads the password as it is typed in. This is a generic attack against all types of encryption software, and doesn't rely on a compromised copy of the software itself. The only real way to defeat this is to use custom hardware which performs its task before any user software has time to run, such as the hardware SFS version currently under development. Footnote [1]: An attacker could employ, for example, what David Farber has described as "supplemental functionality in the keyboard driver". Dangers of Encryption The use of very secure encryption is not without its downsides, however. Making the data completely inaccessible to anyone but the holder of the correct password can be hazardous if the data being protected consists of essential information such as the business records for a company which are needed in its day-to-day operation. If the holder of the encryption password is killed in an accident (or even just rendered unconscious for a time), the potential complete loss of all business records is a serious concern. Another problem is the question of who the holder of the password(s) should be. If the system administrator at a particular site routinely encrypts all the data held there for security purposes, then later access to the entire encrypted dataset is dependant on the administrator, who may forget the password, or die suddenly, or move on to another job and have little incentive to inform their previous employer of the encryption password (for example if they were fired from the previous job). Although there are (as yet) no known cases of this happening, it could occur that the ex-administrator has forgotten the password used at his previous place of employment and might require a small, five-figure consideration to help jog his memory. The difficulty in prosecuting such a case would be rather high, as proving that the encryption system wasn't really installed in good faith by the well-intentioned administrator to protect the company data and that the password wasn't genuinely forgotten would be well nigh impossible. Politics -------- Many governments throughout the world have an unofficial policy on cryptography which is to reserve all knowledge and use of encryption to the government in general and the elite in particular. This means that encryption is to be used firstly (in the form of restrictions on its use) for intelligence-gathering, and secondly for protecting the secret communications of the government. Therefore the government uses encryption to protect its own dealings, but denies its citizens the right to use it to protect their own privacy, and denies companies the right to use it to protect their business data. This policy is enforced in many ways. In the US it is mainly through the use of the ITAR, the International Traffic in Arms Regulations, a law dating back to the second world war and passed with no public debate. This defines all encryption material (hardware and software) as "munitions", subject to special governmental control. These "munitions" in fact have no violent uses (unless, say, you tried beating someone to death with a box of disks). Their only possible use is to protect your personal privacy, and the privacy of your business data. In limiting the use (and export) of encryption technology, the US (and many other countries which follow the lead of the US) are not only denying their citizens the means to ensure the privacy of personal information stored on computer, they are also placing businesses at risk. With no easy way to protect their data, companies are losing billions of dollars a year to industrial espionage which even a simple measure like SFS would help to reduce. Some real-life examples of what the lack of secure encryption can do: - The head of the French DGSE (Direction Generale de la Securite Exterieure) secret service has publicly boasted that his organisation, through industrial espionage, helped French companies acquire over a billion dollars worth of business deals from foreign competitors[1][2]. - A US company was being consistently underbid by a Japanese competitor which was intercepting their electronic traffic. When they started encrypting their messages, the underbidding stopped. A few days later they were requested by the US government to stop using encryption in their traffic[3]. - A New Zealand computer dealer acquired 90 used disks which turned out to contain sensitive financial records from a large bank. The evening after he turned them over to the bank (for a $15,000 cash "finders fee") he was killed in a road accident. The finders fee was never recovered [4]. Despite this major security hole, the bank wouldn't learn from their mistakes. A few weeks later a large networked disk drive originally used by them was supplied to another company as a supposedly "new" replacement for a drive which had died. This drive was found to contain the complete financial records from one of their branches. Not wanting to be scraped off the side of the road one night, the system manager decided to erase the contents of the disk[5]. It isn't known how many more of their confidential financial records this bank has handed out to the public over the years. In the latter case the lack of encryption not only had the potential to cause serious financial harm to the bank involved but resulted in the death of the main player. The use of a program like SFS would have made the contents of the disks unreadable to anyone but the bank. In 1991 the US Justice Department tried to introduce legislation that would require all US digital communications systems to be reengineered (at enormous cost) to support monitoring of message traffic by the FBI. This measure was never passed into law. The next year the FBI tried to introduce a similar measure, but could find no-one willing to back the bill. In April 1993, the US government announced a piece of hardware called the Clipper Chip. They proposed that this device, whose details are classified and which contains a self-destruct mechanism which is activated if attempts are made to examine it too closely, be built into all telephones. Each chip has a special serial number which identifies all communications carried out with that phone. At the beginning of each transmission, telephones equipped with a Clipper Chip negotiate a connection which involves sending identifying information across the phone network, and setting up a common key to use for encrypting the conversation. Built into this setup is a special back door which allows the government, and anyone who works for the government, and anyone who has a friend who works for the government, and anyone with enough money or force to bribe or coerce the aforementioned people, to monitor the conversation. The job is made much easier by the extra identification information which the Clipper Chip attaches to the data stream. The Clipper Chip allows monitoring on a scale even George Orwell couldn't have imagined when he wrote his novel "1984"[6]. A somewhat less blatant attempt to bypass communications privacy is gradually appearing in the rest of the world. The GSM digital telephone system uses a special encryption algorithm called A5X which is a modified form of a stronger system called A5. A5X exists solely as a deliberately crippled A5, and is almost trivial to bypass for electronic monitoring purposes. Although the details of A5 are classified "for national security purposes", various sources have commented that even the original unmodified A5 probably provides only limited security against a determined attack, and the actual implementation exhibits some fundamental flaws (such as a 3-hour key rollover) which can greatly aid an attacker. It is against this worrying background that SFS was created. Right from the start, the idea behind SFS was to provide the strongest possible cryptographic security. No compromises were made, there are no back doors or weaknesses designed into the system, nor will there ever be the deliberate crippling of the system or undermining of its integrity which some organizations would like. The algorithms and methods used in SFS have been selected specifically for their acknowledged strength and general acceptance by the worldwide cryptographic community, and conform to a wide variety of national and international standards for secure encryption. As new algorithms and cryptographic processes appear, SFS will be updated to always provide the best possible security available. Footnote [1] This was reported by cryptographer Martin Hellman at the 1993 RSA Data Security conference on 14-15 January 1993. Footnote [2] Some quotes from FBI Director Louis Freeh from a talk given to the Executives' Club of Chicago on 17 March 1994: "A nation's power is increasingly measured by economic prosperity at home and competitiveness abroad. And in some ways, the United States is a sitting duck for countries and individuals who want to take a short cut to power". [At least 20 nations are] "actively engaged in economic espionage". "This kind of information [cost and price structure, research and development results, marketing plans, bids and customer lists] can be intercepted from fax and satellite communications. It can be monitored from cellular and microwave telephone links. It can be retrieved from inadequately protected computer systems". Footnote [3]: Private communications from one of the people involved. Footnote [4]: This event received nationwide TV, radio, and newspaper coverage at the time. For example, any New Zealand paper published on 7 September 1992 should provide more information. Footnote [5]: Private communications from the system manager involved. Footnote [6]: It has been claimed that the Clipper proposal is an example of the government servicing the people in the sense of the term used in the sentence "The farmer got a bull to service his cows". Security Analysis ----------------- This section attempts to analyse some of the possible failure modes of SFS and indicate which strategies have been used to minimise problems. Incorrect Encryption Algorithm Implementations When implementing something as complex as most encryption algorithms, it is very simple to make a minor mistake which greatly weakens the implementation. It is a well-known fact that making even the smallest change to the DES algorithm reduces its strength considerably. There is a body of test data available as US National Bureau of Standards (NBS) special publication 500-20 which can be used to validate DES implementations. Unfortunately the programmers who produce many commercial DES implementations either don't know it exists or have never bothered using it to verify their code (see the section "Other Software" above), leading to the distribution of programs which perform some sort of encryption which is probably quite close to DES but which nevertheless has none of the security of the real DES. In order to avoid this problem, the SHS code used in SFS has a self-test feature which can be used to test conformance with the data given in Federal Information Processing Standards (FIPS) publication 180, which is the specification for the SHS algorithm. This self-test can be invoked in the mksfs program by giving it the option `-t' for `test': mksfs -t mountsfs and the SFS driver itself use exactly the same code, so testing it in mksfs is enough to ensure correctness of the other two programs. The following tests can take a minute or so to run to completion. The self-test, when run, produces the following output: Running SHS test 1 ... passed, result=0164B8A914CD2A5E74C4F7FF082C4D97F1EDF880 Running SHS test 2 ... passed, result=D2516EE1ACFA5BAF33DFC1C471E438449EF134C8 Running SHS test 3 ... passed, result=3232AFFA48628A26653B5AAA44541FD90D690603 The test values can be compared for correctness with the values given in Appendix 1 of the FIPS publication. If any of the tests fail, mksfs will exit with an error message. Otherwise it will perform a speed test and display a message along the lines of: Testing speed for 10MB data... done. Time = 31 seconds, 323 kbytes/second All SHS tests passed Note that the speed given in this test is only vaguely representative of the actual speed, as the test code used slows the implementation down somewhat. In practice the overall speed should be higher than the figure given. Weak passwords Despite the best efforts of security specialists to educate users about the need to chose good keys, people persist in using very weak passwords to protect critical data. SFS attempts to ameliorate this problem by forcing a minimum key length of 10 characters and making a few simple checks for insecure passwords such as single words (since the number of words of length 10 or more characters is rather small, it would allow a very fast dictionary check on all possible values). The checking is only rudimentary, but in combination with the minimum password length should succeed in weeding out most weak passwords. Another possible option is to force a key to contain at least one punctuation character, or at least one digit as some Unix systems do. Unfortunately this tends to lead people to simply append a single punctuation character or a fixed digit to the end of an existing key, with little increase in security. More password issues are discussed in the section "The Care and Feeding of Passwords" above. Data left in program memory by SFS programs Various SFS utilities make use of critical keying information which can be used to access an SFS volume. Great care has been taken to ensure that all critical information stored by these programs is erased from memory at the earliest possible moment. All encryption-related information is stored in static buffers which are accessed through pointers passed to various routines, and is overwritten as soon as it is no longer needed. All programs take great care to acquire keying information from the user at the last possible moment, and destroy this information as soon as either the disk volume has been encrypted or the keying information has been passed to the SFS driver. In addition, they install default exit handlers on startup which systematically erase all data areas used by the program, both for normal exits and for error or special-condition exits such as the user interrupting the programs execution. Data left in program memory by the SFS driver The SFS driver, in order to transparently encrypt and decrypt a volume, must store at all times the keying information needed to encrypt and decrypt the volume. It is essential that this information be destroyed as soon as the encrypted volume is unmounted. SFS does this by erasing all encryption-related information held by the driver as soon as it receives an unmount command. In addition, the driver's use of a unique disk key for each disk ensures that even if a complete running SFS system is captured by an opponent, only the key for the volume currently mounted will be compromised, even if several volumes are encrypted with the same user password (see the section "Controlled Disclosure of Encrypted Information" below for more details on this). Data left in system buffers by mksfs mksfs must, when encrypting a volume, read and write data via standard system disk routines. This data, consisting of raw disk sectors in either plaintext or encrypted form, can linger inside system buffers and operating system or hard disk caches for some time afterwards. However since none of the information is critical (the plaintext was available anyway moments before mksfs was run, and at worst knowledge of the plaintext form of a disk sector leads to a known plaintext attack, which is possible anyway with the highly regular disk layout used by most operating systems), and since accessing any of this information is highly nontrivial, this is not regarded as a weakness. Data left in system buffers by mountsfs As part of its normal operation, mountsfs must pass certain keying information to the SFS driver through a DOS system call. DOS itself does not copy the information, but simply passes a pointer to it to the SFS driver. After the driver has been initialised, mountsfs destroys the information as outlined above. This is the only time any keying information is passed outside the control of mountsfs, and the value is only passed by reference. Data left in system buffers by the SFS driver Like mksfs, the SFS driver reads and writes data via standard system disk routines. This data, consisting of raw disk sectors in either plaintext or encrypted form, can linger inside system buffers and operating system or hard disk caches for some time afterwards. Once the encrypted volume is unmounted, it is essential that any plaintext information still held in system buffers be destroyed. In order to accomplish this, the SFS driver, when issued with an unmount command, performs two actions intended to destroy any buffered information. First, it issues a cache flush command followed by a cache reset command to any DOS drive cacheing software it recognizes, such as older versions of Microsoft's SmartDrive (with IOCTL commands), newer versions of SmartDrive (version 4.0 and up), the PCTools 5.x cache, Central Point Software's PC-Cache 6.0 and up, the more recent PC-Cache 8.x, Norton Utilities' NCache-F, NCache-S, and the newer NCache 6.0 and up, the Super PC-Kwik cache 3.0 and up, and Qualitas' QCache 4.0 and up. Some other cacheing software can be detected but doesn't support external cache flushing. This step is handled by mountsfs rather than the driver due to the rather complex nature of the procedures necessary to handle the large variety of cacheing software, and the fact that most cacheing software can't be accessed from a device drvier. After this the SFS driver itself issues a disk reset command which has the effect of flushing all buffered and cached data scheduled to be written to a disk, and of marking those cache and buffer entries as being available for immediate use. In addition to the explicit flushing performed by the mountsfs program, many cacheing programs will recognise this as a signal to flush their internal buffers (quite apart from the automatic flushing performed by the operating system and the drive controller). Any subsequent disk accesses will therefore overwrite any data still held in the cache and system buffers. While this does not provide a complete guarantee that the data has gone (some software disk caches will only support a cache flush, not a complete cache reset), it is the best which can be achieved without using highly hardware and system-specific code. SFS volumes left mounted It is possible that an SFS volume may be unintentionally left mounted on an unattended system, allowing free access to both the in-memory keying information and the encrypted volume. In order to lessen this problem somewhat, a fast-unmount hotkey has been incorporated into the SFS driver which allows an unmount command to be issued instantly from the keyboard (see the sections "Mounting an SFS Volume" and "Advanced SFS Driver Options" above). The ease of use of this command (a single keystroke) is intended to encourage the unmounting of encrypted volumes as soon as they are no longer needed, rather than whenever the system is next powered down. As an extra precaution, the driver's use of a unique disk key for each disk ensures that even if a complete running SFS system with an encrypted volume still mounted is captured by an opponent, only the key for the volume currently mounted will be compromised, even if several volumes are encrypted with the same user password (see the section "Controlled Disclosure of Encrypted Information" below for more details on this). Finally, a facility for an automatic timed unmount for volumes left mounted is provided, so that volumes mistakenly left mounted while the system is unattended may be automatically unmounted after a given period of time. This ensures that, when the inevitable distractions occur, encrypted volumes are safely unmounted at some point rather than being left indefinitely accessible to anyone with access to the system. Controlled disclosure of encrypted information To date there are no known laws which can be used to enforce disclosure of encrypted information, a field which is usually covered by safeguards against self-incrimination. However there have been moves in both the US and the UK to pass legislation which would compromise the integrity of encrypted information, or which would remove protection against self-incrimination. In either case this would allow agencies to compel users of cryptographic software to reveal the very information they are trying to protect, often without the users even being aware that their privacy is being compromised. The approach taken in the US, in the form of the Clipper initiative, is to have all encryption keys held by the government. If disclosure of the information is required, the key is retrieved from storage and used to decrypt the information. A side effect of this is that any data which has ever encrypted with the key, and any data which will ever be encrypted in the future, has now been rendered unsafe. This system is best viewed as uncontrolled disclosure. SFS includes a built-in mechanism for controlled disclosure so that, if disclosure is ever required by law, only the information for which access is authorized may be revealed. All other encrypted data remains as secure as it was previously. This is achieved by encrypting each disk volume with a unique disk key which is completely unrelated to the users passphrase. When the passphrase is entered, it is transformed by iterating a one-way hash function over it several hundred times to create an intermediate key which is used to decrypt the disk key itself. The disk volume is then en/decrypted with the disk key rather than the unmodified encyrption key supplied by the user. There is no correlation between the user/intermediate key and the disk key If disclosure of the encrypted information is required, the disk key can be revealed without compromising the security of the user key (since it is unrelated to the user key), or the integrity of any other data encrypted with the user key (since the disk key is unique for each disk volume, so that knowledge of a particular disk key allows only the one volume it corresponds to to be decrypted). The controlled disclosure option is handled by the two mountsfs options `-d' and `-c'. The `-d' option will disclose the unique key for a given disk volume, and the `-c' option will accept this key instead of the usual password. For example to disclose the disk key for the SFS volume "Data", the command would be: mountsfs -d vol=data mountsfs will then ask for the password in the usual manner, and then prompt: You are about to disclose the encryption key for this SFS volume. Are you sure you want to do this [y/n] At this point a response of 'Y' will continue and a response of 'N' will exit the program without disclosing the disk key. A response of 'Y' will print the disk key in the following format: Disk key is 2D 96 B1 06 6D E8 30 21 D5 DB 1B CC 3E 0E C7 CF D6 D1 0C 97 75 DC 06 74 BC 2F E6 A0 3C 56 80 5F 0D 30 DA 54 D3 0D 28 F3 14 DE 79 67 6E 1A 75 DC 33 87 86 29 BA A5 B1 64 5B 79 67 8C 8A 1B D1 27 5B 79 73 6B 45 7B DA 54 43 6C C1 AB 06 67 7E 94 86 F2 50 22 09 8D 21 D5 E7 3A 80 5F 0D 30 DA 54 D3 0D 28 FF 6D E8 30 21 33 87 86 29 C0 ED 4A 22 96 B1 06 6D E8 30 21 D5 1C 2D E6 A0 3C 56 DA 54 43 6C 56 80 This is the unique key needed to access the particular encrypted volume. To use this key instead of the usual one with mountsfs, the command would be: mountsfs -c vol=data mountsfs will then ask for this disk key instead of the usual password: Please enter 264-digit disk key, [ESC] to exit: At this point the disk key should be entered. mountsfs will automatically format the data as it is entered to conform to the above layout. The ESC key can be used at any point to exit the process. Once the key has been entered, the program will perform a validity check on the key. If this test fails, mountsfs will display the message: Error: Incorrect disk key entered and exit. Otherwise it will mount the volume as usual. Since each volume has its own unique disk key, revealing the key for one volume gives access to that volume and no others. Even if 20 volumes are all encrypted with the same user password, only one volume can be accessed using a given disk key. Password Lifetimes and Scope An SFS password which is used over a long period of time makes a very tempting target for an attacker. The longer a particular password is used, the greater the chance that it is compromised. A password used for a year has a far greater chance of being compromised than one used for a day. If a password is used for a long period of time, the temptation for an attacker to spend the effort necessary to break it is far greater than if the password is only a short-term one. The scope of a password is also important. If a password is used to encrypt a single drive containing business correspondence, it's compromise is only mildly serious. If it is employed to protect dozens of disk volumes or a large file server holding considerable amounts of confidential information, the compromise of the password could be devastating. Again, the temptation to attack the master password for an entire file server is far greater than for the password protecting data contained on a single floppy disk. SFS attacks this problem in two ways. First, it uses unique disk keys to protect each SFS volume. The disk key is a 1024-bit cryptographically strong random key generated from nondeterministic values acquired from system hardware, system software, and user input (see the subsection "Generating Random Numbers" below). The data on each disk volume is encrypted using this unique disk key, which is stored in encrypted form in the SFS volume header (this works somewhat like session keys in PEM or PGP, except that a conventional-key algorithm is used instead of a public-key one). To access the disk, the encrypted disk key is first decrypted with the user password, and the disk key itself is used to access the disk. This denies an attacker any known plaintext to work with (as the plaintext consists of the random disk key). To check whether a given password is valid, an attacker must use it to decrypt the disk key, rekey the encryption system with the decrypted disk key, and try to decrypt the disk data. Only then will they know that whether their password guess was correct or not. This moves the target of an attack from a (possibly simple) user password to a 1024-bit random disk key. The other way in which SFS tries to ameliorate the problem of password lifetimes and scope is by making the changing of a password a very simple operation. Since the only thing which needs to be changed when a password change is made is the encrypted disk key, the entire password change operation can be made in a matter of seconds, rather than the many minutes it would take to decrypt and re-encrypt an entire disk. It is hoped that the ease with which passwords can be changed will encourage the frequent changing of passwords by users. An Introduction to Encryption Systems ------------------------------------- For space reasons the following introduction to encryption systems is very brief. Anyone requiring more in-depth coverage is urged to consult the texts mentioned in the references at the end of this document. Encryption algorithms (ciphers) are generally one of two types, block ciphers and stream ciphers. A block cipher takes a block of plaintext and converts the entire block into ciphertext. A stream cipher takes a single bit or byte of plaintext at a time and converts it into ciphertext. There also exist means of converting block ciphers to stream ciphers, and vice versa. Usually a stream cipher is preferred, as it doesn't require data to be quantised to the block size of the cipher in use. Unfortunately, stream ciphers, although more convenient, are usually harder to get right than block ciphers. Many practical stream ciphers are in fact simply block ciphers pretending to be stream ciphers. Virtually all good conventional-key ciphers are so-called product ciphers, in which several (relatively) weak transformations such as substitution, transposition, modular addition/multiplication, and linear transformation are iterated over a piece of data, gaining more and more strength with each iteration (usually referred to as a round). These types of ciphers have been extensively studied and are reasonably well understood. The following table compares the main parameters of several product ciphers. Lucifer is the immediate precursor to the US Data Encryption Standard (DES). Loki is a proposed alternative to DES. FEAL is a fast block cipher designed in Japan. IDEA is a relatively new Swiss block cipher which has been proposed as a successor to DES and which has (so far) proven more resistant to attack then DES. MDC/SHS is a cipher based on the SHS one-way hash function (more on this later). +-----------+------------+----------+-----------+---------------+ | Cipher | Block size | Key size | Number of | Complexity of | | | | (bits) | rounds | Best Attack | +-----------+------------+----------+-----------+---------------+ | Lucifer | 128 | 128 | 16 | 2^21 | | DES | 64 | 56 | 16 | 2^43 | | Loki91 | 64 | 64 | 16 | 2^48 | | FEAL-8 | 64 | 128 | 8 | 10,000 | | IDEA | 64 | 128 | 8 | 2^128 | | MDC/SHS | 160 | 512 | 80 | 2^512 | +-----------+------------+----------+-----------+---------------+ The complexity of the best known attack is the number of operations necessary to allow the cipher to be broken. Note how the block size, key size, and number of rounds don't necessarily give a good indication of how secure the algorithm itself is. Lucifer, although it has twice the block size and over twice the key size of DES, is rather simple to break (the key size of DES is discussed later on in the section on insecurities). DES is the result of several years of work improving on Lucifer. FEAL has been continually changed every year or so when the previous version was broken. Due to this, current versions are treated with some scepticism. Both IDEA and MDC have so far resisted all forms of attack, although recently a class of weak keys have been discovered in IDEA (and a simple change in the algorithm will eliminate these weak keys). Note that in the case of the last two algorithms the given complexity is for a brute-force attack (explained below), which is the most pessimistic kind possible. There may be much better attacks available, although if anyone knows of any they're not saying anything. Of the algorithms listed above, DES has been attacked the hardest, and IDEA and MDC the least, which may go some way to explaining the fact that brute force is the best known attack. There are a large number of modes of operation in which these block ciphers can be used. The simplest is the electronic codebook (ECB) mode, in which the data to be encrypted is broken up into seperate subblocks which correspond to the size of the block cipher being used, and each subblock is encrypted individually. Unfortunately ECB has a number of weaknesses (some of which are outlined below), and would never be used in a well-designed cryptosystem. Using P[] to denote a plaintext block, C[] to denote a ciphertext block, e() to denote encryption, d() to denote decryption, and ^ for the exclusive-or operation, ECB mode encryption can be given as: C[ n ] = e( P[ n ] ) with decryption being: P[ n ] = d( C[ n ] ) A better encryption mode is cipher block chaining (CBC), in which the first data subblock is exclusive-ored with an initialization vector (IV) and then encrypted. The resulting ciphertext is exclusive-ored with the next data subblock, and the result encrypted. This process is repeated until all the data has been encrypted. Because the ciphertext form of each subblock is a function of the IV and all preceding subblocks, many of the problems inherent in the ECB encryption mode are avoided. CBC-mode encryption is: C[ 1 ] = e( P[ 1 ] ^ IV ) C[ n ] = e( P[ n ] ^ C[ n-1 ] ) and decryption is: P[ 1 ] = d( C[ 1 ] ) ^ IV P[ n ] = d( C[ n ] ) ^ C[ n-1 ] Another encryption mode is cipher feedback (CFB), in which the IV is encrypted and then exclusive-ored with the first data subblock to provide the ciphertext. The resulting ciphertext is then encrypted and exclusive-ored with the next data subblock to provide the next ciphertext block. This process is repeated until all the data has been encrypted. Because the ciphertext form of each subblock is a function of the IV and all preceding subblocks (as is also the case for CBC-mode encryption), many of the problems inherent in the ECB encryption mode are avoided. CFB-mode encryption is: C[ 1 ] = P[ 1 ] ^ e( IV ) C[ n ] = P[ n ] ^ e( C[ n-1 ] ) and decryption is: P[ 1 ] = C[ 1 ] ^ e( IV ) P[ n ] = C[ n ] ^ e( C[ n-1 ] ) There are several other modes of operation which are not covered here. More details can be found in the texts given in the references. One point worth noting is that by using a different IV for each message in CBC and CFB mode, the ciphertext will be different each time, even if the same piece of data is encrypted with the same key. This can't be done in ECB mode, and is one of its many weaknesses. There are several standard types of attack which can be performed on a cryptosystem. The most restricted of these is a ciphertext-only attack, in which the contents of the message are unknown. This kind of attack virtually never occurs, as there is always some regularity or known data in the message which can be exploited by an attacker. This leads to the next kind of attack, the known-plaintext attack. In this case some (or all) of the plaintext of the message is known. This type of attack is fairly easy to mount, since most data consists of well-known, fixed-format messages containing standard headers, a fixed layout, or data conforming to a certain probability distribution such as ASCII text. Finally, in a chosen-plaintext attack the attacker is able to select plaintext and obtain the corresponding ciphertext. This attack is also moderately easy to mount, since it simply involves fooling the victim into transmitting a message or encrypting a piece of data chosen by the attacker. This kind of attack was used to help break the Japanese "Purple" cipher during WWII by including in a news release a certain piece of information which it was known the Japanese would encrypt and transmit to their superiors. However attacks of this kind are usually entirely unnecessary. Too many cryptosystems in everyday use today are very easy to break, either because the algorithms themselves are weak, because the implementations are incorrect, or because the way they are used is incorrect. Often amateurs think they can design secure systems, and are not aware of what an expert cryptanalyst could do. Sometimes there is insufficient motivation for anybody to invest the work needed to produce a secure system. Many implementations contain flaws which aren't immediately obvious to a non-expert. Some of the possible problems include: - Use of easily-searched keyspaces. Some algorithms depend for their security on the fact that a search of all possible encryption keys (a so-called brute force attack) would take too long to be practical. Or at least, it took too long to be practical when the algorithm was designed. The Unix password encryption algorithm is a prime example of this. The DES key space is another example. Recent research has indicated that the DES was not in fact weakened by having only 56 bits of key material (as has often been claimed), since the inherent strength of the algorithm itself only provides this many bits of security (that is, that increasing the key size would have no effect since other attacks which don't involve knowing the key can be used to break the encryption in far less than 2^56 operations). The encryption used in the Pkzip archiver can usually be broken automatically in less time than it takes to type the password in for authorized access to the data since, although it allows longer keys than DES, it makes the check for valid decryption keys exceedingly easy for an attacker. - Use of insecure algorithms designed by amateurs. This covers the algorithms used in the majority of commercial database, spreadsheet, and wordprocessing programs such as Lotus 123, Lotus Symphony, Microsoft Excel, Microsoft Word, Paradox, Quattro Pro, WordPerfect, and many others. These systems are so simple to break that the author of at least one package which does so added several delay loops to his code simply to make it look as if there was actually some work involved. - Use of insecure algorithms designed by "experts". An example is the standard Unix crypt command, which is an implementation of a rotor machine much like the German Enigma cipher which was broken during WWII. There is a program called cbw (for `crypt breakers workbench') which can automatically decrypt data encrypted with crypt. After the war, the US government even sold Enigma cipher machines to third-world governments without telling them that they knew how to break this form of encryption. - Use of incorrectly-implemented algorithms. Some encryption programs use the DES algorithm, which consists of a series of complicated and arbitrary- seeming bit transformations controlled by complex lookup tables. These transformations and tables are very easy to get wrong. A well-known fact about the DES algorithm is that even the slightest deviation from the correct implementation significantly weakens the algorithm itself. In other words any implementation which doesn't conform 100% to the standard may encrypt and decrypt data perfectly, but is in practice rather easier to break than the real thing. The US National Bureau of Standards (now the National Institute of Standards and Technology) provides a reference standard for DES encryption. A disappointingly large number of commercial implementations fail this test. - Use of badly-implemented algorithms. This is another problem which besets many DES implementations. DES can be used in several modes of operation, some of them better than others. The simplest of these is the Electronic Codebook (ECB) mode, in which a block of data is broken up into seperate subblocks which correspond to the unit of data which DES can encrypt or decrypt in one operation, and each subblock is then encrypted seperately. There also exist other modes such as CBC in which one block of encrypted data is chained to the next (such that the ciphertext block n depends not only on the corresponding plaintext but also on all preceding ciphertext blocks 0..n-1), and CFB, which is a means of converting a block cipher to a stream cipher with similar chaining properties. There are several forms of attack which can be used when an encrypted message consists of a large number of completely independant message blocks. It is often possible to identify by inspection repeated blocks of data, which may correspond to patterns like long strings of spaces in text. This can be used as the basis for a known-plaintext attack. ECB mode is also open to so-called message modification attacks. Lets assume that Bob asks his bank to deposit $10,000 in account number 12-3456-789012-3. The bank encrypts the message `Deposit $10,000 in account number 12-3456-789012-3' and sends it to its central office. Encrypted in ECB mode this looks as follows: E( Deposit $10,000 in acct. number 12-3456-789012-3 ) Bob intercepts this message, and records it. The encrypted message looks as follows: H+2nx/GHEKgvldSbqGQHbrUfotYFtUk6gS4CpMIuH7e2MPZCe Later on in the day, he intercepts the following a message: H+2nx/GHEKgvldSbqGQHbrUfotYFtUk61Pts2LtOHa8oaNWpj Since each block of text is completely independant of any surrounding block, he can simply insert the blocks corresponding to his account number: ................................gS4CpMIuH7e2MPZCe in place of the existing blocks, and thus alter the encrypted data without any knowledge of the encryption key used. Bob has since gone on to early retirement in his new Hawaiian villa. ECB mode, and the more secure modes such as CBC and CFB are described in several standards. Some of these standards make a reference to the insecurity of ECB mode, and recommend the use of the stronger CBC or CFB modes. Usually implementors stop reading at the section on ECB, with the result being that many commercial packages which use DES and which do manage to get it correct end up using it in ECB mode. - Protocol errors. Even if a more secure encryption mode such as CBC or CFB mode is used, there can still be problems. If a standard message format (such as the one shown above) is used, modification is still possible, except that now instead of changing individual parts of a message the entire message must be altered, since each piece of data is dependant on all previous parts. This can be avoided by prepending a random initialisation vector (IV) to each message, which propagates through the message itself to generate a completely different ciphertext each time the same message is encrypted. The use of the salt in Unix password encryption is an example of an IV, although the range of only 4096 values is too small to provide real security. In some ways, cryptography is like pharmaceuticals. Its integrity is important. Bad penicillin looks just the same as good penicillin. Determining whether most software is correct or not is simple - just look at the output. However the ciphertext produced by a weak encryption algorithm looks no different from the ciphertext produced by a strong algorithm.... until an opponent starts using your supposedly secure data against you, or you find your money transfers are ending up in an account somewhere in Switzerland, or financing Hawaiian villas. Design Details -------------- This section goes into a few of the more obscure details not covered in the section on security analysis, such as the encryption algorithm used by SFS, the generation of random numbers, the handling of initialization vectors (IV's), and a brief overview on the deletion of sensitive information retained in memory after a program has terminated (this is covered in more detail in the section "Security Analysis" above). The Encryption Algorithm used in SFS Great care must be taken when choosing an encryption algorithm for use in security software. For example, the standard Unix crypt(1) command is based on a software implementation of a rotor machine encryption device of the kind which was broken by mathematicians using pencil and paper in the late 1930's. Indeed, there exists a program called `crypt breaker's workbench' which allows the automated breaking of data encrypted using the crypt(1) command. The insecurity of various other programs has been mentioned elsewhere. It is therefore imperative that a reliable encryption system, based on world-wide security standards, and easily verifiable by consulting these standards, be used. When a block cipher is used as a stream cipher by running it in CFB (cipher feedback) mode, there is no need for the cipher's block transformation to be a reversible one as it is only ever run in one direction (generally encrypt-only). Therefore the use of a reversible cipher such as DES or IDEA is unnecessary, and any secure one-way block transformation can be substituted. This fact allows the use of one-way hash functions, which have much larger block sizes (128 or more bits) and key spaces (512 or more bits) than most reversible block ciphers in use today. The transformation involved in a one-way hash function takes an initial hash value H and a data block D, and hashes it to create a new hash value H': hash( H, D ) -> H' or, more specifically, in the function used in SFS: H + hash( D ) -> H' This operation is explained in more detail in FIPS Pub. 180 which defines the Secure Hash Standard. By using H as the data block to be encrypted and D as the key, we can make the output value H' dependant on a user-supplied key. That is, when H is the plaintext, D is the encryption key, and H' is the ciphertext: plaintext H | v +---------+ | SHS |<- key D +---------+ | v ciphertext H' If we regard it as a block cipher, the above becomes: H' = SHS( H ) which is actually: C = e( P ) Since we can only ever "encrypt" using a one-way hash function, we need to run the "cipher" in cipher feedback mode, which doesn't require a reversible encryption algorithm. By the properties of the hash function, it is computationally infeasible to either recover the key D or to control the transformation H -> H' (in other words given a value for H' we cannot predict the H which generated it, and given control over the value H we cannot generate an arbitrary H' from it). The MDC encryption algorithm is a general encryption system which will take any one-way hash function and turn it into a stream cipher running in CFB mode. The recommended one-way hash function for MDC is the Secure Hash Standard as specified in Federal Information Processing Standards (FIPS) Publication 180. SHS is used as the block transformation in a block cipher run in CFB mode as detailed in AS 2805.5.2 section 8 and ISO 10116:1991 section 6, with the two parameters (the size of the feedback and plaintext variables) j and k both being set to the SHS block size of 160 bits. The properties of this mode of operation are given in Appendix A3 of AS 2805.5.2 and Annex A.3 of ISO 10116:1991. The CFB mode of operation is also detailed in a number of other standards such as FIPS Publication 81 and USSR Government Standard GOST 28147-89, Section 4. The use of an initialization vector (IV) is as given in ISO 10126-2:1991 section 4.2, except that the size of the IV is increased to 160 bits from the 48 or 64 bits value given in the standard. This is again detailed in a number of other standards such as GOST 28147-89 Section 3.1.2. The derivation of the IV is given in the section "Encryption Considerations" below. The key setup for the MDC encryption algorithm is performed by running the cipher over the encryption key (in effect encrypting the key with MDC using itself as the key) and using the encrypted value as the new encryption key. This procedure is then repeated a number of times to make a "brute-force" decryption attack more difficult, as per the recommendation in the Public-Key Cryptography Standard (PKCS), part 1. This reduces any input key, even one which contains regular data patterns, to a block of white noise equal in size to the MDC key data. The exact key scheduling process for MDC is as follows: Initialization: - The SHS hash value H is set to the key IV[1]. - The SHS data block D is set to all zeroes. - The key data of length 2048 bits is set to a 16-bit big-endian value containing the length of the user key in bytes, followed by up to 2032 bits of user key. SHS hash value H = key IV; SHS data block D = zeroes; key_data [0:15] = length of user key in bytes; key_data [16:2048 ] = user key, zero-padded; Key schedule: - The following process is iterated a number of times: - The 2048-bit key data block is encrypted using MDC. - Enough of the encrypted key data is copied from the start of the key data block into the SHS data block D to fill it. for i = 1 to 200 do encrypted_key = encrypt( key_data ); D = encrypted_key; During the repeated encryptions, the IV is never reset. This means that the IV from the end of the n-1 th data block is re-injected into the start of the n th data block. After 200 iterations, the "randomness" present in the key has been diffused throughout the entire key data block. Although the full length of the key data block is 2048 bits, the SHS algorithm only uses 512 bits of this (corresponding to the SHS data block D) per iteration. The remaining 1536 bits take part in the computation (by being carried along via the IV) but are not used directly. By current estimates there are around 2^256 atoms in the universe. Compiling a table of all 2^512 possible keys which would be necessary for a brute-force attack on MDC would therefore be a considerable challenge to an attacker, requiring, at least, the creation of another 512 * 2^256 universes to hold all the keys. Even allowing for the current best-case estimate of a creation time of 7 days per universe, the mere creation of storage space for all the keys would take an unimaginably large amount of time. The SFS key schedule operation has been deliberately designed to slow down special hardware implementations, since the encryption algorithm is rekeyed after each iteration. Normal high-speed password-cracking hardware would (for example, with DES) have 16 separate key schedules in a circular buffer, each being applied to a different stage of a 16-stage pipeline (one stage per DES round) allowing a new result to be obtained in every clock cycle once the pipeline is filled. In MDC the key data is reused multiple times during the 80 rounds of SHS, requiring 80 separate key schedules for the same performance as the 16 DES ones. However since the algorithm is rekeyed after every iteration for a total of 200 iterations, this process must either be repeated 200 times (for a corresponding slowdown factor of 200), or the full pipeline must be extended to 16,000 stages to allow the one-result-per-cycle performance which the 16-stage DES pipeline can deliver (assuming the rekeying operation can be performed in a single cycle). Changing the iteration count to a higher value will further slow down this process. The number of iterations of key encryption is controlled by the user, and is generally done some hundreds of times. The setup process in SFS has been tuned to take approximately half a second on a workstation rated at around 15 MIPS (corresponding to 200 iterations of the encryption process), making a brute-force password attack very time-consuming. Note that the key IV is injected at the earliest possible moment in the key schedule rather than at the very end, making the use of a precomputed data attack impossible. The standard method of injecting the encryption IV at the end of the key schedule process offers very little protection against an attack using precomputed data, as it is still possible to precompute the key schedules and simply drop in the encryption IV at the last possible moment. Footnote [1]: Some sources would refer to this value as a `salt'. The term `key IV' is used here as this is probably a more accurate description of its function. Generating Random Numbers One thing which cryptosystems consume in large quantities are random numbers. Not just any old random value, but cryptographically strong random numbers. A cryptographically strong random value is one which cannot be predicted by an attacker (if the attacker can predict the value, which is usually used to set up encryption keys, then they can make a guess at the encryption key itself). This automatically rules out all software means of generating random values, and means specialised hardware must be used. Very few PC's are currently equipped with this type of hardware. However SFS requires 1024 random bits for each encrypted disk, in the form of the disk key (see the section "[Password Lifetimes and Scope]" above). SFS therefore uses a number of sources of random numbers, both ones present in the hardware of the PC and one external source: - Various hardware timers which are read occasionally when the program is running (generally after operations which are long and complex and will be heavily affected by external influences such as interrupts, video, screen, and disk I/O, and other factors. - The contents and status information of the keyboard buffer - Disk driver controller and status information - Mouse data and information - Video controller registers and status information [ - The clock skew between two hardware clocks available on the PC. Due to background system activity such as interrupt servicing, disk activity, and variable-length instruction execution times, these clocks run out-of-phase. SFS uses this phase difference as a source of random numbers]. - The timing of keystrokes when the password is entered. SFS reads the high-speed 1.19 MHz hardware timer after each keystroke and uses the timer values as a source of random numbers. This timer is used both to measure keystroke latency when the password is entered and read at random times during program execution. Trials have shown that this 16-bit counter yields around 8 bits of random information (the exact information content is difficult to gauge as background interrupts, video updates, disk operations, protected-mode supervisor software, and other factors greatly affect any accesses to this counter, but an estimate of 8 bits is probably close enough[1]). - The timing of disk access latency. The exact operation is as follows: Read a timer-based random disk sector Add its contents (8 bits) Read the high-speed 1.19 MHz hardware timer (13 bits) Use the two values for the next random sector This is repeated as often as required (in the case of SFS this is 10 times). Assuming a (currently rather optimistic) maximum of 5ms to acquire a sector this provides about 13 bits of randomness per disk operation. The number of factors which influence this value is rather high, and includes among other things the time it takes the BIOS to process the request, the time it takes the controller to process the request, time to seek to the track on the disk, time to read the data (or, if disk cacheing is used, time for the occasional cache hit), time to send it to the PC, time to switch in and out of protected mode when putting it in the cache, and of course the constant 3-degree background radiation of other interrupts and system activity happening at the same time. If a solid-state disk were being used, the hardware latency would be significantly reduced, but currently virtually no 386-class PC's have solid-state disks (they're reserved for palmtops and the like), so this isn't a major concern. An estimate of the number of random bits available from each source is as follows: Keystroke latency, 8 bits per key 80 bits for minimum 10-char key Second password entry for encryption 80 bits for minimum 10-char key Disk access latency, 13 bits per read 130 bits for 10 reads Disk sector data, 8 bits 80 bits for 10 reads System clocks and timers 3 bits Video controller information 4 bits Keyboard buffer information 4 bits Disk status information 4 bits General system status 4 bits Random high-speed timer reads 120 bits for 15 reads -------- Total 509 bits These figures are very conservative estimates only, and are based on timing experiments with typed-in passwords and a careful scrutiny of the PC's hardware and system status data. In practice (especially with longer passwords) the number of random bits is increased somewhat (for example with a 30-character password the total may be as high as 829 bits of random information). However even the minimal estimate of 509 bits is adequate for the 512-bit key required by MDC. Each quantum of semi-random information is exclusive-ored into a 1024-bit buffer which is initially set to all zeroes. Once 1024 bits of buffer have been filled, the data is encrypted with MDC to distribute the information, and the first 512 bits of the 1024-bit buffer is used as the key for the next MDC encyrption pass. Then more data is added until, again, 1024 bits of buffer have been filled, whereupon the data is again mixed by encrypting it with MDC. This process is repeated several times, with the amount of "randomness" in the buffer increasing with each iteration. Before being used, this data is encrypted 10 times over with MDC to ensure a complete diffusion of randomness. Since the IV for the encryption is reused for the next pass through the buffer, any information from the end of the buffer is thus reinjected at the start of the buffer at the next encryption pass. Although this method of generating random numbers is not perfect, it seems to be the best available using the existing hardware. General estimates of the exact amount of truly random information which can be acquired in this manner are in the vicinity of several hundred bits. However these all assume that an attacker is in possession of what amounts to a complete hardware trace of events on the machine in question. Allowing for a reasonable amount of physical system security, it can be assumed that the random data used in SFS is unpredictable enough to provide an adequate amount of security against all but the most determined attacker. Footnote [1]: If an opponent can obtain several hours of keystroke timings and can come up with a good model including serial correlations, they may be able to reduce the likely inputs to the random number accumulator to a somewhat smaller value, or at least bias their guesses to fall within the range of likely values. Encryption Considerations When a block cipher is converted to handle units of data larger than its intrinsic block size, a number of weaknesses can be introduced, depending on the mode of operation which is chosen for the block cipher. For example, if two identical ciphertext blocks are present in different locations in a file, this may be used to determine the plaintext. If we can find two identical blocks of ciphertext when cipher block chaining (CBC) is used, then we know that: P[ i ] = d( C[ i ] ) ^ C[ i-1 ] P[ j ] = d( C[ j ] ) ^ C[ j-1 ] where C is the ciphertext, P is the plaintext, and e() and d() are encryption and decryption respectively. Now if C[ i ] = C[ j ], then d( C[ i ] ) = d( C[ j ] ), which cancel out when xor'd so that: P[ i ] ^ C[ i-1 ] = P[ j ] ^ C[ j-1 ] or: P[ j ] = P[ i ] ^ C[ i-1 ] ^ C[ j-1 ] Knowing C[ i ] and C[ j ] we can determine P[ i ] and P[ j ], and knowing either P[ i ] or P[ j ] we can determine the other. Something similar holds when cipher feedback (CFB) mode is used, except that now the decryption operation is: P[ i ] = e( C[ i-1 ] ) ^ C[ i ] P[ j ] = e( C[ j-1 ] ) ^ C[ j ] Now if C[ i ] = C[ j ] then e( C[ i ] ) = e( C[ j ] ) (recall that in CFB mode the block cipher is only ever used for encryption), so that they again cancel out, so: P[ i ] ^ e( C[ i-1 ] ) = P[ j ] ^ e( C[ j-1 ] ) or: P[ i ] = P[ j ] ^ e( C[ i-1 ] ) ^ e( C[ j-1 ] ) In general this problem is of little consequence since the probability of finding two equal blocks of ciphertext when using a 160-bit block cipher on a dataset of any practical size is negligible. More esoteric modes of operation such as plaintext feedback (PFB) and ones whose acronyms have more letters than Welsh place names tend to have their own special problems and aren't considered here. The problem does become serious, however, in the case of sector-level encryption, where the initialization vector cannot be varied. Although the IV may be unique for each sector, it remains constant unless special measures such as reserving extra storage for sector IV's which are updated with each sector write are taken. If a sector is read from disk, a small change made to part of it (for example changing a word in a text file), and the sector written out to disk again, several unchanged ciphertext/plaintext pairs will be present, allowing the above attack to be applied. Running a program such as a disk defragmenter will rewrite a large number of sectors while leaving the IV unchanged, allowing an opponent access to large quantities of XOR'd plaintext blocks simply by recording the disk contents before and after the defragmentation process. Normally this problem would be avoided by using a different IV for each encrypted message, but most disk systems don't have the room to store an entire sectors worth of data as well as the IV needed to en/decrypt it. An additional disadvantage of the CFB encryption mode is that the data in the last block of a dataset may be altered by an attacker to give different plaintext without it affecting the rest of the block since the altered ciphertext in the last block never enters the feedback loop. This type of attack requires that an opponent possess at least two copies of the ciphertext, and that they differ only in the contents of the last block. In this case the last ciphertext block from one copy can be subsituted for the last ciphertext block in the other copy, allowing a subtle form of message modification attack. In fact in combination with the previously mentioned weakness of CFB, an attacker can determine the XOR of the plaintexts in the last block and substitute an arbitrary piece of "encrypted" plaintext to replace the existing data. There are several approaches to tackling this problem. The most simplistic one is to permute the plaintext in a key-dependant manner before encryption and after decryption. This solution is unsatisfactory as it simply shuffles the data around without necessarily affecting any particular plaintext or ciphertext block. The desired goal of a change in any part of the plaintext affecting the entire dataset is not achieved. A better solution is to encrypt data twice, once from front to back and then from back to front[1]. The front-to-back pass propagates any dependencies to the back of the dataset, and the back-to-front pass propagates dependencies back to the front again. In this way a single change in any part of the plaintext affects the entire dataset. The disadvantage of this approach is that it at least halves the speed of the encryption, as all data must be encrypted twice. If the encryption is done in software, this may create an unacceptable loss of throughput. Even with hardware assistance there is a noticeable slowdown, as no hardware implementations easily support backwards encryption, requiring the data to be reversed in software before the second pass is possible. The best solution is probably to use a word-wise scrambler polynomial like the one used in SHA. With a block of plaintext P this is: P[ i ] ^= P[ i-K1 ] ^ P[ i-K2 ] with suitable values for the constants K1 and K2. If K2 is chosen to be 5 (the SHA block size in words) then the initial values of the 5 words (which can be thought of as as P[ -5 ]...P[ -1 ]) are simply the sectorIV. The value of K1 is arbitrary, SFS uses a value of 4. This technique is used by first setting the initial values of the 5 words to the sectorIV. The scrambler function is then run over the entire data block, propagating all dependencies to the last 5 words in the block. These last 5 words are then used as the IV for the CFB encryption of the entire block. In this way the encryption IV depends on all the bits in the block, and the scrambling does a moderately good job of breaking up statistical patterns in the plaintext. No information is lost, so no randomness in the sectorIV is misplaced. This also provides resistance to the selective-modification attack which allows an attacker to change selected bits in the last block of a CFB-encrypted dataset without damage. By destroying the IV used in the CFB encryption, the first block is completely corrupted, which is unlikely to go unnoticed. To decrypt a dataset encrypted in this manner, the first 5 words of ciphertext are shifted into the feedback path, and the remainder of the dataset is decrypted in the standard manner. The last 5 decrypted words are then used as the IV to decrypt the first encrypted block. Finally, the scrambler is run over the recovered plaintext to undo the changes made during the encryption scrambling. The overall en/decryption process used by SFS, in the case of 512-byte sectors and 32-bit words (so that each sector contains 128 words), is: Encryption: using sectorIV[ 0 ]...sectorIV[ 4 ] as the scrambler IV scramble data[ 0 ]...data[ 127 ] using data[ 127-5 ]...data[ 127-1 ] as the encryption IV encrypt data[ 0 ]...data[ 127 ] Decryption: using data[ 0 ]...data[ 4 ] as the encryption IV decrypt data[ 5 ]...data[ 127 ] using data[ 127-5 ]...data[ 127-1 ] as the encryption IV decrypt data[ 0 ]...data[ 4 ] using sectorIV[ 0 ]...sectorIV[ 4 ] as the scrambler IV scramble data[ 0 ]...data[ 127 ] where the scrambling operation is: data[ i ] ^= data[ i-4 ] ^ data[ i-5 ] as outlined above. Note that the i-4 and i-5 th values referred to here are the original, scrambled values, not the descrambled values. The easiest way to implement this is to cache the last 5 scrambled values and cyclically overwrite them as each word in the data buffer is processed. Footnote [1]: To be precise, you need some sort of feedback from the end of a block on the first encryption pass to the start of the block on the next encryption pass. A block can be encrypted forwards twice as long as the IV is wrapped around back to the start of the block for the second encryption pass. A Discussion of the MDC Encryption Algorithm[1] (A word on notation: The notation {0,1}^k is used to mean the set of all bit strings of length k, and {0,1}* means the set of all bit strings, including the empty string. Any message can be viewed as a bit string by means of a suitable encoding.) The encryption method used by SFS is somewhat unusual. In some respects it is similar to Merkle's "Meta Method" for obtaining cryptosystems. The method relies on the existence of secure one-way hash functions. A hash function is a function that takes as input an arbitrary number of bits and produces a fixed-sized output called the "message digest". In other words, hash functions have the form h : {0,1}* --> {0,1}^k for some fixed k, and the hash of a message M is defined to be h( M ). A secure one-way hash function is a hash function with the following properties: 1. For each message M, it is easy to compute h( M ), 2. Given M, it is computationally infeasible to compute M' with h( M ) = h( M' ) (secure against forgery), and 3. It is computationally infeasible to compute M and M' with h( M ) = h( M' ) (secure against collisions). (For a good, although rather technical discussion of hash functions, or indeed cryptography in general, see "Contemporary Cryptology. The Science of Information Integrity" edited by Gustavus Simmons, IEEE Press, 1992 (ISBN 0-87942-277-7)). The terms "easy to compute" and "infeasible to compute" can be made precise, but we'll settle for this informal terminology anyway. Roughly, "easy to compute" means that it will take a tolerable amount of time to compute the answer, even on a rather small machine; "infeasible to compute" means that it should take eons to find out a particular result, even when using millions of computers of the fastest conceivable technology in parallel. Examples of hash functions include the MD2, MD4, and MD5 hash functions, developed by Ron Rivest of RSA Data Security, Inc., which have been (at least in the case of MD4 and MD5) placed in the public domain, and the Secure Hash Standard SHS, developed by NIST (with significant input from the NSA). The existence of secure one-way hash functions has not been proved, although there exist some strong candidates, including MD5 and SHS. The reference implementations of the above hashing functions include one interesting aspect which makes it possible to use them as encryption functions. Since the hashing of a very large amount of data in one sweep is not desirable (because all the data would have to be in memory at the time of hashing), most hashing algorithms allow you to hash data incrementally. This is made possible by augmenting the definition of a hash function to include the state of the last hashing operation. In other words, a hash function now has the form h : {0,1}^k x {0,1}* --> {0,1}^k, where the first argument is the previous hash value, and the hash of a message M = ( M1, M2, ..., Mn ) is defined to be h( h( ...( h( h0, M1 ), M2 ), ... ), Mn ). (The value of all the h evaluations must not change if the message is broken up into blocks of different length, but all of the previously mentioned hash functions have that property). Here, h0 is a fixed, known initial value that is used in all hashing calculations. This is not the way "real" hash functions behave, but it is close enough. For example, the MD5 hashing function has "initialization", "updating", and "finalization" parts, where the finalization part appends the number of hashed bytes to the message, hashes one final time, and returns the final hash value. This means that the hashing "context" must include the number of bytes hashed so far, without it being a part of the hash value. The hash function can be said to have "memory". If we assume that h is a secure one-way hashing function, we can now use such an h as a scrambling device. For example, if we set E( M ) = h( h0, M ) for every message M, M will not be recoverable from E( M ), because h is secure by definition. Another method would be to supply M to any standard MSDOS or UNIX utility and use the resulting error message as the ciphertext. (Remember, a computer is a device for turning input into error messages). However, there are still two problems to be solved before we can use hash functions as encryption functions: 1. The scrambling process is not controlled by a key, and 2. The scrambling process is not invertible, so there is no way to decrypt the ciphertext. Both problems can be solved by interchanging the roles of hash and data and by using CFB mode in the encryption process. In other words, let K be an arbitrarily long key, let M = ( M1, ..., Mn ) be a message, broken up in chunks of k bits, let IV be an initialization vector, and set C1 = M1 xor h( IV, K ) Ci = Mi xor h( C( i-1 ), K ) for 1 < i <= n. This is sent to the recipient, who easily recovers the plaintext by P1 = C1 xor h( IV, K ) Pi = Ci xor h( C( i-1 ), K ) for 1 < i <= n, since we have P1 = ( M1 xor h( IV, K ) ) xor h( IV, K ) = M1 xor ( h( IV, K ) xor h( IV,K ) ) because xor is associative, = M1 xor 0, because x xor x = 0, = M1, because x xor 0 = x, and similarly for the Pi's. This method of encryption also offers more security than using ECB mode, assuming that this were possible with hash functions, since the plaintext is diffused over the entire ciphertext, destroying plaintext statistics, and thus foiling straightforward ciphertext correlation attacks. This method can clearly be used for any hash function which can hash incrementally. Thus, it is a "Meta Method" for turning hash functions into encryption functions. This is called the Message Digest Cipher (MDC) method of encryption. Specific instances of the method have the name of the hash function added as a suffix. For example, the MDC method applied to the MD5 hash function would be referred to as MDC/MD5. SFS uses MDC/SHS. Footnote [1]: This analysis was contributed by Stephan Neuhaus, <neuhaus@informatik.uni-kl.de> Deletion of SFS Volumes Truly deleting data from magnetic media is very difficult. The problem lies in the fact that when data is written to the medium, the write head sets the polarity of most, but not all, of the magnetic domains. This is partially due to the inability of the writing device to write in exactly the same location each time, and partially due to the variations in media sensitivity and field strength over time and among devices. In general terms, when a one is written to disk, the media records a one, and when a zero is written, the media records a zero. However the actual effect is closer to obtaining something like 0.95 when a zero is overwritten with a one, and 1.05 when a one is overwritten with a one. Normal disk circuitry is set up so that both these values are read as ones, but using specialized circuitry it is possible to work out what previous `layers' contained (in fact on some systems it may be possible to recover previous data with a simple software modification to the hardware control code). This problem is further complicated by the fact that the heads might not pass exactly over the same track position when data is rewritten, leaving a trace of the old data still intact. Current-generation drives reduce this problem somewhat as track and linear densities have now become so high that the traditional optical methods of extracting information from the disk platters has become much more difficult, and in some cases impossible, as the linear bit cell is below the optical diffraction limit for visible light. While some data patterns can still be discerned, recognizing others would be limited to some subset of patterns. Despite this small respite, when all the above factors are combined it turns out that each track on a piece of media contains an image of everything ever written to it, but that the contribution from each `layer' gets progressively smaller the further back it was made. Using the same signal-processing technology which is used to clean up satellite images, low-level recorded speech, and other data, it is possible to recover previous data with a remarkable degree of accuracy back to a level limited only by the sensitivity of the equipment and the amount of expertise of the organisation attempting the recovery. Intelligence organisations have a *lot* of expertise in this field. The basic concept behind the overwriting scheme used by SFS is to flip each magnetic domain on the disk back and forth as much as possible (this is the basic idea behind degaussing - magnetic media are limited in their ability to store high-frequency oscillations, so we try to flip the bits as rapidly as possible). This means that the disk head should be run at the highest possible frequency, and the same pattern should not be written twice in a row. If the data was encoded directly, that would mean a an alternating pattern of ones and zeroes. However, disks always use a NRZI encoding scheme in which a 1 bit signifies an inversion, making the desired pattern a series of one bits. This leads to a further complication as all disks use some form of run-length limited (RLL) encoding, so that the adjacent ones won't be written. This encoding is used so that transitions aren't placed too closely together, or too far apart, which would mean the drive would lose track of where it was in the data. The basic limitation on disks is the proximity of 1 bits. Floppies (which are a more primitive form of the technology used in hard disks) like to keep the 1 bits 4us apart. However they can't be kept too far apart or the read clock loses synchronisation. This "too far" figure depends a lot on the technology in the drive, it doesn't depend on the magnetic media much (unlike the "too close" figure, which depends a lot on the media involved). The first single-density encoding wrote one user data bit, then one "1" clock bit, taking a total of 8us. This was called FM, since a 1 bit was encoded as two transitions (1 wavelength) per 8 us, while a 0 bit was encoded as one transition (1/2 wavelength). Then it was discovered that it was possible to have two 0 bits between adjacent 1s. The resulting encoding of 4 bits into 5 was called group code recording (GCR) which was (0,2) RLL. This allowed 4 user data bits to be written in 5 * 4us = 20 us, for an effective time per user bit of 5 us, which was a big improvement over 8 us. But GCR encoding was somewhat complex. A different approach was taken in modified FM (MFM), which suppressed the 1 clock bit except between adjacent 0's. Thus, 0000 turned into 0(1)0(1)0(1)0 (where the ()s are the inserted clock bits), 1111 turned into 1(0)1(0)1(0)1, and 1010 turned into 1(0)0(0)1(0)0. The maximum time between 1 bits was now three 0 bits. However, there was at least one 0 bit, so it became possible to clock the bits at 2us/bit and not have more than one 1 bit every 4us. This achieved one user bit per 4 us, a result which was better than GCR and obtainable with simpler circuitry. As can be seen, the effective data rate depends on the bit rate (which has been 4 us, 4 us and 2 us in these examples) and the encoding rate, a measure of the encoding efficiency. The encoding rates have been 1/2, 4/5 and 1/2. There is a (2,7) RLL code with rate 1/2, meaning that 1 user bit goes to 2 encoded bits (although the mapping involves multi-bit groups and is somewhat complex), but there are always at least two 0 bits between 1 bits, so 1 bits happen at most every 3 bit times. This allows the clock to be reduced to 1.3333 us (this 2/1.33333 = 6/4 = 3/2 is the source of the added capacity gain of RLL hard drive controllers over equivalent MFM controllers). The time per user bit is now 2.6666 = 2 2/3 us. However, the faster clock is finicky. It is also possible to use a (1,7) RLL encoding. Since this is (1,x), the clock rate is 2 us/bit, but the encoding efficiency improves from 1/2 to 2/3. This allows 2 effective user bits per 6 us, or 3 us per user bit. For hard drives, it is easier to increase the clock rate by an extra 9/8 than to fiddle with a clock 50% faster, so this is very popular with more recent disk drives. The three most common RLL codes are (1,3) RLL (usually known as MFM), (2,7) RLL (the original "RLL" format), and (1,7) RLL (which is popular in newer drives). The origins of these codes are explained in more details below. Fortunately, each of these three have commonly-used encoding tables. A knowledge of these tables can be used to design overwrite patterns with lots of transitions after being encoded with whatever encoding technique the drive uses. For MFM, the patterns to write to produce this are 0000 and 1111. So a couple of rounds of this should be included in the overwrite pattern. MFM drives are the oldest, lowest-density drives around (this is especially true for the very-low-density floppy drives). As such, they're the easiest to recover data from with modern equipment and we need to take the most care with them. For (1,7) RLL, the patterns to write are 0011 and 1100, or 0x33 and 0xCC when expressed as bytes. To provide some security against bit misalignment, the values 0x99 and 0x66 should be included as well (although drive manufacturers like to keep things byte-aligned, so this sort of bit misalignment is unlikely to happen). For (2,7) RLL drives, three patterns are necessary, and the problem of byte endianness question rears its head. The previous two cases are not significantly affected by shifting the bytes around, but this one is. Fortunately, thanks to the strong influence of IBM mainframe drives, everything seems to be uniformly big-endian within bytes (that is, the most significant bit is written to the disk first). For (2,7) RLL using the standard tables: 10 -> 0100 11 -> 1000 000 -> 00100 010 -> 100100 011 -> 001000 0010 -> 00100100 0011 -> 00001000 the bit patterns 100100100..., 010010010... and 001001001... will encode into the maximum-frequency encoded strings 010010010010... (0x49, 0x24, 0x92), 100100100100... (0x92, 0x49, 0x24), and 001001001001.... (0x24, 0x92, 0x49). Writing all three of these patterns will cover all the bases. For the latest crop of high-density drives which use methods like Partial- Response Maximum-Likelihood (PRML) methods which may be roughly equated to the trellis encoding done by V.32 modems in that it is effective but computationally expansive)m all we can do is write a variety of random patterns, because the processing inside the drive is too complex to second-guess. Fortunately, these drives push the limits of the magnetic media much more than the old MFM drives ever did by encoding data with much smaller magnetic domains, closer to the physical capacity of the magnetic media. If these drives require sophisticated signal processing just to read the most recently written data, reading overwritten layers is also correspondingly more difficult. A good scrubbing with random data will do about as well as can be expected. To deal with all these types of drives in one overwrite pattern, SFS uses the following sequence of 30 consecutive writes to erase data: 1. Random 2. 0000..., MFM encoded to 01010101... 3. 1111..., MFM encoded to 10101010... 4. Random 5. 010010..., (2,7) RLL encoded to 100100100100... 6. 100100..., (2,7) RLL encoded to 010010010010... 7. 001001..., (2,7) RLL encoded to 001001001001... 8. Random 9. 00110011..., (1,7) RLL encoded to 010101010101... 10. 11001100..., (1,7) RLL encoded to 101010101010... 11. 01100110..., (1,7) misaligned RLL encoded to 010101010101... 12. 10011001..., (1,7) misaligned RLL encoded to 101010101010... 13. Random 14. 1111..., MFM encoded to 10101010... 15. 0000..., MFM encoded to 01010101... 16. Random 17. 001001..., (2,7) RLL encoded to 001001001001... 18. 100100..., (2,7) RLL encoded to 010010010010... 19. 010010..., (2,7) RLL encoded to 100100100100... 20. Random 21. 11001100..., (1,7) RLL encoded to 101010101010... 22. 00110011..., (1,7) RLL encoded to 010101010101... 23. 10011001..., (1,7) misaligned RLL encoded to 101010101010... 24. 01100110..., (1,7) misaligned RLL encoded to 010101010101... 25. Random 26. 0000..., MFM encoded to 01010101... 27. 1111..., MFM encoded to 10101010... 28. Random 29. Random 30. Random All patterns are repeated twice, once in each order, and MFM is repeated three times, because MFM drives are generally the lowest density, and thus particularly easy to examine. There is a commonly-held belief that there is a US government standard for declassifying magnetic media which involves overwriting it three times. In fact this method is for declassifying core (computer memory) rather than magnetic media. The government standard for declassifying magnetic media probably involves concentrated acid, furnaces, belt sanders, or any combination of the above. This is necessary not only because data recovery from the actual tracks itself may (with some effort) be possible, but because of factors like like intelligent drives which keep so-called "alternative cylinders" on a disk free to allow dynamic re-allocation of data to one of these tracks in case a disk block develops errors. The original block is no longer accessible through any sort of normal access mechanism, and the data on it can't be destroyed without going through some unusual contortions which tend to be highly hardware-dependant. Similarly, drives conforming to some of the more high-level protocols such as SCSI are relatively free to interpret commands sent to them in whichever way they choose (as long as they still conform to the SCSI specification). Thus some drives, if sent a SCSI Format Media command may return immediately without performing any action, may simply perform a read test on the entire disk (the most common option), or may actually write data to the disk[1]. This is rather common among newer drives which can't directly be low-level formatted, unlike older ST-412 and ST-506 MFM or RLL drives. For example trying to format an IDE drive generally has little effect - a low-level format generally isn't possible, and the standard DOS `format' command simply writes a boot record, FAT, and root directory, and returns. Therefore if the data is very sensitive and is stored on floppy disk, it can best be destroyed by removing the media from the disk liner and burning it. Disks are relatively cheap, and can easily be replaced. Permanently destroying data on fixed disks is less simple, but the multiple-overwrite option used by SFS at least makes it quite challenging (and expensive) to recover any information. Footnote [1]: Again it may be possible to bypass this using highly hardware- specific methods. For example Quantum SCSI drives manufactured a few years ago could be forced to write data to disk during a format by changing the sector filler byte before the format command was issued.